Stochastic differential equations (SDEs) are used to describe a wide variety of complex stochastic dynamical systems. Learning the hidden physics within SDEs is crucial for unraveling fundamental understanding of these systems' stochastic and nonlinear behavior. We propose a flexible and scalable framework for training artificial neural networks to learn constitutive equations that represent hidden physics within SDEs. The proposed stochastic physics-informed neural ordinary differential equation framework (SPINODE) propagates stochasticity through the known structure of the SDE (i.e., the known physics) to yield a set of deterministic ODEs that describe the time evolution of statistical moments of the stochastic states. SPINODE then uses ODE solvers to predict the moment trajectories. SPINODE learns neural network representations of the hidden physics by matching the predicted moments to those estimated from data. Recent advances in automatic differentiation and mini-batch gradient descent with adjoint sensitivity are leveraged to establish the unknown parameters of the neural networks. We demonstrate SPINODE on three benchmark in-silico case studies and analyze the framework's numerical robustness and stability. SPINODE provides a promising new direction for systematically unraveling the hidden physics of multivariate stochastic dynamical systems with multiplicative noise.
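To make the moment-matching idea concrete, here is a minimal PyTorch sketch for a scalar SDE dx = f(x) dt + g(x) dW with known diffusion g and an unknown drift f modeled by a neural network. The mean-field moment closure, the forward-Euler solver, and all names (f_net, the placeholder data moments) are illustrative assumptions, not the paper's exact formulation:

```python
# Minimal SPINODE-style sketch: learn the hidden drift f(x) by matching
# predicted moment trajectories to moments estimated from data.
import torch
import torch.nn as nn

f_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def g(x):                                    # known diffusion (known physics)
    return 0.1 * x

def moment_odes(m1):
    # Mean-field closure: evaluate the hidden drift at the current mean.
    f = f_net(m1.view(1, 1)).squeeze()
    dm1 = f                                  # d/dt E[x]   ~ E[f(x)]
    dm2 = 2.0 * m1 * f + g(m1) ** 2          # d/dt E[x^2] ~ 2 E[x f] + E[g^2]
    return dm1, dm2

def predict_moments(m1_0, m2_0, t_grid):
    m1, m2 = m1_0, m2_0
    out = [(m1, m2)]
    for i in range(1, len(t_grid)):          # forward-Euler ODE solve
        dt = t_grid[i] - t_grid[i - 1]
        dm1, dm2 = moment_odes(m1)
        m1, m2 = m1 + dt * dm1, m2 + dt * dm2
        out.append((m1, m2))
    return out

opt = torch.optim.Adam(f_net.parameters(), lr=1e-3)
t_grid = torch.linspace(0.0, 1.0, 11)
m1_data = torch.linspace(1.0, 0.5, 11)       # placeholder data-estimated moments
m2_data = m1_data ** 2 + 0.01

for step in range(200):
    opt.zero_grad()
    pred = predict_moments(m1_data[0], m2_data[0], t_grid)
    loss = sum((p1 - d1) ** 2 + (p2 - d2) ** 2
               for (p1, p2), d1, d2 in zip(pred, m1_data, m2_data))
    loss.backward()                          # autodiff through the solver
    opt.step()
```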
The proliferation of automatic faithfulness metrics for summarization has produced a need for benchmarks to evaluate them. While existing benchmarks measure the correlation with human judgements of faithfulness on model-generated summaries, they are insufficient for diagnosing whether metrics are: 1) consistent, i.e., decrease as errors are introduced into a summary, 2) effective on human-written texts, and 3) sensitive to different error types (as summaries can contain multiple errors). To address these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a dataset of 889 human-written, minimally different summary pairs, where a single error (from an ontology of 7 types) is introduced to a summary from the CNN/DailyMail dataset to produce an unfaithful summary. We find BUMP complements existing benchmarks in a number of ways: 1) the summaries in BUMP are harder to discriminate and less probable under SOTA summarization models, 2) BUMP enables measuring the consistency of metrics, and reveals that the most discriminative metrics tend not to be the most consistent, 3) BUMP enables the measurement of metrics' performance on individual error types and highlights areas of weakness for future work.
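The consistency property is easy to state in code: on each minimal pair, a faithfulness metric should score the faithful summary above its minimally edited unfaithful twin. A sketch, using a toy token-overlap stand-in for the metric (the benchmark itself evaluates SOTA faithfulness metrics, not this baseline):

```python
# Consistency of a faithfulness metric on minimal pairs: fraction of pairs
# where the faithful summary outscores its single-error twin.
def consistency(metric, pairs):
    """pairs: list of (source, faithful_summary, unfaithful_summary)."""
    hits = sum(metric(src, good) > metric(src, bad)
               for src, good, bad in pairs)
    return hits / len(pairs)

# Toy metric: token overlap with the source (for shape only, not a real metric).
def overlap_metric(source, summary):
    src, summ = set(source.lower().split()), set(summary.lower().split())
    return len(src & summ) / max(len(summ), 1)

pairs = [("the cat sat on the mat",
          "a cat sat on a mat",
          "a dog sat on a mat")]
print(consistency(overlap_metric, pairs))
```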
Objective: Evictions are involved in a cascade of negative events that can lead to unemployment, homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction incidences and their attributes from electronic health record (EHR) notes. Materials and Methods: We annotated eviction status in 5000 EHR notes from the Veterans Health Administration. We developed a novel model, called Knowledge Injection based on Ripple Effects of Social and Behavioral Determinants of Health (KIRESH), which we show substantially outperforms other state-of-the-art approaches such as fine-tuning pre-trained language models like BioBERT and Bio_ClinicalBERT. Moreover, we designed a prompt to further improve model performance by exploiting the intrinsic connection between the two sub-tasks of eviction presence and period prediction. Finally, we applied Temperature Scaling-based calibration to our KIRESH-Prompt method to avoid over-confidence issues arising from the imbalanced dataset. Results: KIRESH-Prompt achieved a Macro-F1 of 0.6273 (presence) and 0.7115 (period), significantly higher than the 0.5382 (presence) and 0.67167 (period) obtained by simply fine-tuning the Bio_ClinicalBERT model. Conclusion and Future Work: KIRESH-Prompt substantially improves eviction status classification. In future work, we will evaluate the generalizability of the model framework to other applications.
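Temperature scaling itself is a standard calibration recipe (Guo et al., 2017): fit a single scalar T on held-out logits so that softmax(logits / T) is better calibrated. A sketch of that standard recipe, not the authors' exact code; the validation logits and labels are placeholders:

```python
# Temperature scaling: learn one scalar T > 0 minimizing NLL on a held-out set.
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=100):
    log_t = torch.zeros(1, requires_grad=True)        # T = exp(log_t) > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

val_logits = torch.randn(256, 3)                      # placeholder logits
val_labels = torch.randint(0, 3, (256,))
T = fit_temperature(val_logits, val_labels)
calibrated_probs = F.softmax(val_logits / T, dim=-1)  # calibrated predictions
```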
Pretrained transformer models have achieved state-of-the-art results on many tasks and benchmarks recently. Many state-of-the-art Language Models (LMs), however, do not scale well above the threshold of 512 input tokens. In specialized domains, though (such as legal, scientific, or biomedical), models often need to process very long text (sometimes well above 10000 tokens). Even though many efficient transformers have been proposed (such as Longformer, BigBird, or FNet), so far, only very few such efficient models are available for specialized domains. Additionally, since the pretraining process is extremely costly in general - but even more so as the sequence length increases - it is often only within reach of large research labs. One way of making pretraining cheaper is the Replaced Token Detection (RTD) task, which provides more signal during training, since the loss can be computed over all tokens. In this work, we train Longformer models with the efficient RTD task on legal data to showcase that pretraining efficient LMs is possible using much less compute. We evaluate the trained models on challenging summarization tasks that require the model to summarize long texts, to show to what extent the models can achieve good performance on downstream tasks. We find that both the small and base models outperform their baselines on the in-domain BillSum and out-of-domain PubMed tasks in their respective parameter range. We publish our code and models for research purposes.
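The reason RTD yields denser training signal: the discriminator makes a binary original-vs-replaced prediction at every position, so the loss covers the full sequence, whereas masked language modeling only scores the roughly 15% of positions that were masked. A shape-level sketch, with the generator and the Longformer discriminator stubbed out by random tensors:

```python
# Replaced Token Detection: dense per-token binary loss over all positions.
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 8, 128, 30522
original = torch.randint(0, vocab, (batch, seq_len))

# A small generator proposes replacements at ~15% of positions (stubbed here).
mask = torch.rand(batch, seq_len) < 0.15
sampled = torch.randint(0, vocab, (batch, seq_len))
corrupted = torch.where(mask, sampled, original)

# Per-token labels: 1 where the token actually changed.
labels = (corrupted != original).float()

# Stand-in for the discriminator's per-token logits (e.g., a Longformer head).
disc_logits = torch.randn(batch, seq_len)

# The loss is computed over *all* seq_len positions, not just masked ones.
rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, labels)
```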
Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop there. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
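The protocol in miniature: a focal agent joins a substrate alongside a fixed background population, and only the focal players' returns are scored, so a high score requires generalizing to partners never encountered during training. The sketch below uses hypothetical stub classes, not the Melting Pot 2.0 API:

```python
# Toy stand-in for the Melting Pot evaluation loop: score a focal policy
# across scenarios, each pairing a substrate with fixed background co-players.
import random

class StubScenario:
    """Hypothetical stand-in for substrate + background population."""
    def __init__(self, n_background):
        self.n_background = n_background

    def episode_return(self, focal_policy):
        # Background players act according to a fixed (here: random) policy.
        focal_action = focal_policy()
        background = [random.choice(["cooperate", "defect"])
                      for _ in range(self.n_background)]
        coop = background.count("cooperate")
        return coop + (2 if focal_action == "cooperate" else 1)

def evaluate(focal_policy, scenarios, episodes=100):
    # Only the focal player's return is scored, averaged over episodes.
    return {name: sum(s.episode_return(focal_policy)
                      for _ in range(episodes)) / episodes
            for name, s in scenarios.items()}

scenarios = {"commons_harvest": StubScenario(6), "clean_up": StubScenario(4)}
print(evaluate(lambda: "cooperate", scenarios))
```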
Deep learning-based pose estimation algorithms can successfully estimate the pose of objects in an image, especially in the domain of color images. Deep learning models for 6D object pose estimation on X-ray images often use custom architectures that rely on extensive CAD models and simulated data for training. Recent RGB-based methods opt to solve pose estimation problems using small datasets, making them more attractive for the X-ray domain, where medical data is scarcely available. We refine an existing RGB-based model (SingleShotPose) to estimate the 6D pose of a marked cube from grayscale X-ray images by creating a generic solution trained on only real X-ray data and adjusted for X-ray acquisition geometry. The model regresses 2D control points and calculates the pose through 2D/3D correspondences using Perspective-n-Point (PnP), allowing a single trained model to be used across all supported cone-beam-based X-ray geometries. Since modern X-ray systems continuously adjust acquisition parameters during a procedure, it is essential for such a pose estimation network to take these parameters into account in order to be deployed successfully and find real use. With a 5-cm/5-degree accuracy of 93% and an average 3D rotation error of 2.2 degrees, the results of the proposed approach are comparable with state-of-the-art alternatives, while requiring significantly fewer real training examples and being applicable in real-time applications.
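The pose-recovery step is standard Perspective-n-Point: given the regressed 2D control points and the cube's known 3D corner coordinates, solve for rotation and translation under the camera intrinsics implied by the acquisition geometry. A sketch with OpenCV, where the intrinsics and the ground-truth pose used to fabricate consistent placeholder image points are assumptions:

```python
# PnP from 2D/3D correspondences, as in the paper's pose-recovery stage.
import numpy as np
import cv2

# 3D corners of a cube with 5 cm edges, in object coordinates (meters).
object_points = np.array([[x, y, z]
                          for x in (0.0, 0.05)
                          for y in (0.0, 0.05)
                          for z in (0.0, 0.05)])

# Assumed pinhole intrinsics standing in for the cone-beam geometry.
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])

# Fabricate consistent 2D points from a known pose (stand-ins for the
# control points the network would regress).
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.0, 0.0, 0.3])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)

# 2D/3D correspondences -> 6D pose.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)       # 3x3 rotation matrix of the recovered pose
```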
Previous work has shown that there exists a scaling law between the size of language models (LMs) and their zero-shot performance on different downstream NLP tasks. In this work, we show that this phenomenon does not hold when evaluating large LMs on tasks with negated prompts; instead, they exhibit an inverse scaling law. We evaluate 9 different tasks with negated prompts on (1) pretrained LMs (OPT & GPT-3) of varying sizes (125M - 175B), (2) LMs further pretrained to generalize to novel prompts (instruction-tuned), (3) LMs provided with few-shot examples, and (4) LMs fine-tuned specifically on negated prompts; all LM types perform worse on negated prompts and show a huge gap with respect to human performance when comparing the average scores on the original and negated prompts. By highlighting a critical limitation of existing LMs and methods, we urge the community to develop new approaches for building LMs that actually follow the given instructions. We provide the code and datasets for exploring negated prompts at https://github.com/joeljang/negated-prompts-for-llms.
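The headline comparison is simple to express: score each model on every task twice, once with the original prompt and once with its negated version, and report the gap in average accuracy. A sketch with a stub scorer (the real evaluation lives in the linked repository):

```python
# Compare average task accuracy under original vs. negated prompts per model.
def negation_gap(score_fn, tasks, models):
    """score_fn(model, prompt) -> accuracy in [0, 1]; stubbed below."""
    gaps = {}
    for model in models:
        orig = sum(score_fn(model, t["prompt"]) for t in tasks) / len(tasks)
        neg = sum(score_fn(model, t["negated_prompt"]) for t in tasks) / len(tasks)
        gaps[model] = {"original": orig, "negated": neg, "gap": orig - neg}
    return gaps

tasks = [{"prompt": "Answer the question correctly: ...",
          "negated_prompt": "Do NOT answer the question correctly: ..."}]
stub_score = lambda model, prompt: 0.9 if "NOT" not in prompt else 0.4
print(negation_gap(stub_score, tasks, ["125M", "175B"]))
```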
In recent years, transformer architectures have become increasingly popular. The Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. A remarkable aspect of this model is its ability to make inferences on categories it was never trained on. In this work, we explore the use of MDETR on a new task, action detection, without any prior training for it. We obtain quantitative results using the Atomic Visual Actions (AVA) dataset. Although the model does not achieve the best reported performance on the task, we believe this is an interesting finding: it shows that a multi-modal model can be used to solve a task it was not designed for. Finally, we believe this line of research may lead to the generalization of MDETR to other downstream tasks.
As convolutional neural networks (CNNs) become more accurate at object recognition, their representations become increasingly similar to those of the primate visual system. This finding has inspired us and other researchers to ask whether the implication also runs the other way: if CNN representations become more brain-like, does the network become more accurate? Previous attempts to address this question showed very modest gains in accuracy, owing in part to limitations of the regularization methods used. To overcome these limitations, we developed a new neural data regularizer for CNNs that uses Deep Canonical Correlation Analysis (DCCA) to optimize the similarity of CNN image representations to those of the monkey visual cortex. With this new neural data regularizer, we see much larger performance gains in both classification accuracy and few-shot accuracy compared to the previous state-of-the-art neural data regularizers. These networks are also more robust to adversarial attacks than their unregularized counterparts. Together, these results confirm that neural data regularization can improve CNN performance, and they introduce a new method for obtaining even larger performance gains.
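A simplified sketch of how such a regularizer can enter training: add a term to the task loss that rewards canonical correlation between a batch of CNN features and the paired neural recordings. Full DCCA learns nonlinear projections of both views; the linear CCA-style objective below is a deliberately reduced stand-in:

```python
# Linear CCA-style similarity between CNN features and neural recordings;
# its negative can be added to the task loss as a neural data regularizer.
import torch

def cca_similarity(feats, neural, eps=1e-4):
    X = feats - feats.mean(0)                 # center both views
    Y = neural - neural.mean(0)
    n = X.shape[0]
    cxx = X.T @ X / (n - 1) + eps * torch.eye(X.shape[1])
    cyy = Y.T @ Y / (n - 1) + eps * torch.eye(Y.shape[1])
    cxy = X.T @ Y / (n - 1)
    # Whiten via Cholesky factors; singular values of the whitened
    # cross-covariance are the canonical correlations.
    ixx = torch.linalg.inv(torch.linalg.cholesky(cxx))
    iyy = torch.linalg.inv(torch.linalg.cholesky(cyy))
    T = ixx @ cxy @ iyy.T
    return torch.linalg.svdvals(T).sum()

feats = torch.randn(64, 32)                   # CNN features for 64 images
neural = torch.randn(64, 20)                  # paired neural responses
reg = -cca_similarity(feats, neural)          # loss = task_loss + lam * reg
```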
Simulation is an important step in robotics for creating control policies and testing various physical parameters. Soft robotics is a field that presents unique physical challenges for simulation due to the nonlinearity of its deformable components and other innovative, often complex, physical properties. Because of the computational cost of simulating soft and heterogeneous objects with traditional techniques, rigid-body robotics simulators are not well suited to simulating soft robots. Thus, many engineers must either build their own one-off simulators tailored to their systems or use existing simulators with reduced performance. To facilitate the development of this exciting technology, this work presents an interactive, accurate, and versatile simulator for a wide variety of soft robots. Our open-source 3D simulation engine, Cronos, parallelizes a mass-spring model for ultra-fast performance on both deformable and rigid objects. Our approach is applicable to a wide array of nonlinear material configurations, including high deformability, volumetric actuation, and heterogeneous stiffness. This versatility provides the ability to freely mix materials and geometric components within a single robot simulation. By leveraging the flexibility and scalability of nonlinear Hookean mass-spring systems, the framework simulates soft and rigid objects via a highly parallel model at near-real-time speed. We describe an efficient GPU CUDA implementation, which we demonstrate achieves computation of over 1 billion elements per second on consumer-grade GPU cards. The dynamic physical accuracy of the system is validated by comparing its results against Euler-Bernoulli beam theory, natural frequency predictions, and the behavior of soft structures under large deformation.
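The core update such an engine parallelizes is a Hookean mass-spring step: accumulate spring forces, add gravity, and integrate with semi-implicit Euler. A NumPy sketch of that single step, where the constants and the toy two-mass system are illustrative (the real engine runs this per-element in CUDA):

```python
# One semi-implicit Euler step of a Hookean mass-spring system.
import numpy as np

def step(pos, vel, springs, rest_len, k, mass, dt, damping=0.99):
    i, j = springs[:, 0], springs[:, 1]
    d = pos[j] - pos[i]                              # spring vectors
    length = np.linalg.norm(d, axis=1, keepdims=True)
    # Hooke's law along each spring direction.
    f = k * (length - rest_len[:, None]) * d / np.maximum(length, 1e-9)
    forces = np.zeros_like(pos)
    np.add.at(forces, i, f)                          # equal and opposite
    np.add.at(forces, j, -f)
    forces[:, 2] -= 9.81 * mass                      # gravity on the z-axis
    vel = damping * (vel + dt * forces / mass)       # semi-implicit Euler
    return pos + dt * vel, vel

# Two masses joined by one spring, initially stretched along x.
pos = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
vel = np.zeros_like(pos)
springs = np.array([[0, 1]])
for _ in range(100):
    pos, vel = step(pos, vel, springs, rest_len=np.array([1.0]),
                    k=50.0, mass=0.1, dt=1e-3)
```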